
Anirud/add all inf server model types#636

Draft
anirudTT wants to merge 8 commits into dev from
anirud/add-all-inf-server-model-types

Conversation

@anirudTT
Collaborator

No description provided.

…ndling

- Added a new function `_sync_model_catalog` in `run.py` to regenerate the model catalog from the inference server artifact, improving model management.
- Updated `ContainersView` to include model status from the catalog, enhancing the response data structure.
- Introduced a mapping for backend model types to frontend constants, allowing for better integration and display of model types in the UI.
- Refactored frontend components to utilize the new model type and status information, improving user experience and clarity in model deployment status.
- Enhanced the `FirstStepForm` to group models by status, providing a clearer overview of model compatibility and deployment readiness.
- Introduced `display_model_type` to the `ModelImpl` class for better representation of model types.
- Updated `ContainersView` to include `display_model_type` in the response data structure.
- Enhanced model synchronization logic to incorporate `display_model_type` from the inference server.
- Refactored frontend components to group models by `display_model_type`, improving clarity in model selection and compatibility visualization.
- Updated `SelectionSteps` interface to include `display_model_type` for consistent data handling across components.
- Added `device_id` parameter to `wait_for_frontend_and_open_browser` and updated related functions to support device-specific model deployment.
- Introduced a new `deployment_store.py` for thread-safe JSON file storage of model deployment records, replacing the previous Django ORM model.
- Updated `run_container` function to handle device-specific deployments and maintain a pending record for in-progress deployments.
- Enhanced `DeployView` to accept `device_id` for model deployment requests, improving flexibility in deployment configurations.
- Refactored Docker-related configurations in `docker-compose` files for better readability and maintainability.
- Added new `VoicePipelineView` for handling voice processing workflows, integrating STT, LLM, and TTS functionalities.
- Updated model type configurations to include new types for TTS and VLM, enhancing model management capabilities.
- Improved container matching logic in `update_deploy_cache` to prioritize exact name matches for model implementations, with a fallback to longest substring matches.
- Added `messages_to_prompt` function to convert chat messages into a plain text prompt for model requests.
- Implemented `get_model_name_from_container` to query the vLLM API for the exact model name loaded in a container, enhancing model identification.
- Updated `InferenceView` and `AgentView` to utilize the new model name retrieval and message formatting functions, improving data handling for model requests.
- Refactored service route determination in `map_service_route` to consider model capabilities, ensuring appropriate routing for chat and completion models.
- Enhanced error handling in streaming functions to log HTTP errors more effectively, improving debugging capabilities.
- Changed the path for mounting workflow logs in the Docker Compose file to point to the new artifacts directory, ensuring proper access to deployment logs for the inference server.
- Introduced new endpoints for device state and reset functionality in the backend, allowing for unified device state retrieval and reset operations.
- Updated `SystemResourceService` to include methods for extracting device state and telemetry data, improving the accuracy of device status reporting.
- Refactored frontend components to utilize the new device state context, enhancing the user interface with real-time device status updates and improved error handling.
- Implemented a reset dialog in the frontend to manage device resets, providing users with clear feedback during the reset process.
- Updated routing to include new device state and reset paths, ensuring seamless integration with existing API structures.
- Enhanced error handling and logging throughout the device management process for better debugging and user experience.
- Introduced a new JSON file `models_from_inference_server.json` containing detailed configurations for 60 models, including their names, types, device configurations, inference engines, and environment variables.
- Each model entry includes metadata such as version, docker image, service routes, and parameter counts, enhancing the model management capabilities within the inference server.
- This addition supports improved integration and deployment of various model types, including chat, speech recognition, and image generation.
- Added an exception for the new JSON file `models_from_inference_server.json` to ensure it is tracked by Git, facilitating better management of model configurations within the inference server.
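The `_sync_model_catalog` step described above — regenerating the model catalog from the inference-server artifact (`models_from_inference_server.json`) — could be sketched as follows. The artifact schema is an assumption here (a JSON list of entries each carrying a `"name"` field); the PR's actual function lives in `run.py` and may differ.

```python
import json
from pathlib import Path


def sync_model_catalog(artifact_path):
    """Rebuild a {model_name: entry} catalog from the inference-server
    artifact JSON.  Sketch only: the real _sync_model_catalog in run.py
    may carry extra fields (display_model_type, status, etc.)."""
    entries = json.loads(Path(artifact_path).read_text())
    # Key by model name so views like ContainersView can look up
    # status and display_model_type in O(1).
    return {entry["name"]: entry for entry in entries}
```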
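The thread-safe JSON deployment store (`deployment_store.py`) that replaces the Django ORM model might look like this minimal sketch. Class and method names are assumptions, not the PR's actual API; the key idea is holding one lock across the whole read-modify-write so concurrent deployments cannot clobber each other's records.

```python
import json
import threading
from pathlib import Path


class DeploymentStore:
    """Thread-safe JSON-file store for model deployment records (sketch)."""

    def __init__(self, path):
        self._path = Path(path)
        self._lock = threading.Lock()

    def _load(self):
        if not self._path.exists():
            return {}
        return json.loads(self._path.read_text())

    def upsert(self, deploy_id, record):
        # Lock spans load + modify + write, so a "pending" record written
        # by an in-progress deployment cannot be lost to a racing update.
        with self._lock:
            records = self._load()
            records[deploy_id] = record
            self._path.write_text(json.dumps(records, indent=2))

    def get(self, deploy_id):
        with self._lock:
            return self._load().get(deploy_id)
```

A real implementation would likely also need atomic file replacement (write to a temp file, then `os.replace`) to survive crashes mid-write.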
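The container-matching rule in `update_deploy_cache` — exact name match first, falling back to the longest substring match — can be expressed compactly. The function name and inputs below are illustrative, not the PR's code.

```python
def match_container(container_name, model_names):
    """Match a container to a model implementation name.

    Prefer an exact match; otherwise fall back to the longest model
    name that appears as a substring of the container name, so e.g.
    "llama-3.1-8b-instruct" beats "llama-3.1-8b" for a container
    named "llama-3.1-8b-instruct-v1".  Returns None on no match.
    """
    if container_name in model_names:
        return container_name
    candidates = [name for name in model_names if name in container_name]
    return max(candidates, key=len) if candidates else None
```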
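A `messages_to_prompt` helper of the kind described above might flatten OpenAI-style chat messages like this. The exact prompt format (role prefixes, trailing `assistant:` cue) is an assumption; the PR's version may use a model-specific template.

```python
def messages_to_prompt(messages):
    """Flatten chat messages ({"role": ..., "content": ...} dicts)
    into a plain text prompt for completion-style model requests."""
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    lines.append("assistant:")  # cue the model to produce the reply
    return "\n".join(lines)
```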
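Querying the vLLM API for the exact loaded model name, as `get_model_name_from_container` does, presumably hits the OpenAI-compatible `GET /v1/models` endpoint that vLLM exposes. A sketch, with the response parsing split out so it can be exercised without a live container; the `base_url` shape is an assumption.

```python
import json
from urllib.request import urlopen


def parse_model_name(models_response):
    """Extract the first model id from an OpenAI-style /v1/models payload,
    e.g. {"object": "list", "data": [{"id": "meta-llama/..."}]}."""
    data = models_response.get("data") or []
    return data[0].get("id") if data else None


def get_model_name_from_container(base_url):
    """Ask a vLLM container which model it actually loaded.
    base_url is e.g. "http://localhost:8000" (assumed)."""
    with urlopen(f"{base_url}/v1/models", timeout=5) as resp:
        return parse_model_name(json.load(resp))
```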
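Routing by model capability in `map_service_route` — sending chat-capable models to the chat endpoint and plain completion models to the completions endpoint — could reduce to something like the following. The capability names and the exact routes are assumptions based on the OpenAI-style API the bullets describe.

```python
def map_service_route(capabilities):
    """Pick a service route from a model's capability list (sketch).

    Chat-capable models get the chat route; completion-only models get
    the completions route; anything else returns None so the caller can
    handle non-text models (TTS, VLM, ...) separately.
    """
    if "chat" in capabilities:
        return "/v1/chat/completions"
    if "completion" in capabilities:
        return "/v1/completions"
    return None
```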
@anirudTT anirudTT marked this pull request as draft February 26, 2026 18:43
@rfatimaTT rfatimaTT marked this pull request as ready for review March 3, 2026 19:27
@anirudTT anirudTT marked this pull request as draft March 10, 2026 20:28
